Abstract. The problem of (approximately) counting the number of
triangles in a graph is one of the basic problems in graph theory. In
this paper we study the problem in the streaming model. We study the
amount of memory required by a randomized algorithm to solve this
problem. In case the algorithm is allowed one pass over the stream, we
present a best possible lower bound of $\Omega(m)$ for graphs $G$ with
$m$ edges on $n$ vertices. If a constant number of passes is allowed, we
show a lower bound of $\Omega(m/T)$, $T$ the number of triangles. We
match, in some sense, this lower bound with a 2-pass
$O(m/T^{1/3})$-memory algorithm that solves the problem of
distinguishing graphs with no triangles from graphs with at least $T$
triangles. We present a new graph parameter $\rho(G)$, the triangle
density, and conjecture that the space complexity of the triangles
problem is $\Omega(m/\rho(G))$. We match this by a second algorithm that
solves the distinguishing problem using $O(m/\rho(G))$-memory.
1 Introduction

Counting the number of triangles in a graph $G = (V, E)$ is one of the
basic algorithmic questions in graph theory. From a theoretical point of
view, the key question is to determine the time and space complexity
of the problem. The brute-force approach enumerates all possible triples
of nodes (taking $O(n^3)$ time, where $n$ is the number of vertices in
$G$). The algorithms with the lowest time complexity for counting
triangles rely on fast matrix multiplication. The asymptotically fastest
algorithm to date runs in time $O(n^{2.376})$ [7]. An algorithm that runs
in time $O(m^{1.41})$ with space complexity $\Theta(n^2)$ is given in [2]
($m$ is the number of edges in $G$). In more practical applications, the
number of triangles is a frequently used network statistic in the
exponential random graph model [15, 9], and naturally appears in models
of real-world network evolution [14] and in web applications [4, 8]. In
the context of social networks, triangles have a natural interpretation:
friends of friends tend to be friends [18], and this can be used in link
recommendation and friend suggestion [16].

The memory restrictions that arise when dealing with huge graphs lead
one to consider the streaming model: the edges of the graph arrive one
at a time in a stream, and the algorithm processes each edge as it
arrives, in an online fashion (once an edge has passed, the algorithm
cannot access it again). The algorithm is allowed ideally one pass (or a
limited number of passes) over the stream. The parameter of interest is
the amount of memory that the algorithm uses to solve the problem.

Definition 1. Triangles(c) is the problem of approximating the num- 
ber of triangles in the input graph within a multiplicative factor of 0.9, 
with probability at least 2/3, using at most c passes over the data stream. 

The choice of constants 0.9 and 2/3 in the definition is for the sake of 
clear and simple presentation. One can take both the approximation rate 
and success probability to be parameters of the problem. 

Currently, no non-trivial algorithms are known to solve Triangles(c)
when $c$ is constant (by trivial we mean an algorithm that uses
$\Theta(m)$ memory, $m$ the number of edges, which is asymptotically the
same as storing all graph edges). All existing approximation algorithms
receive $T_3$ (the number of triangles in the input graph) as part of
their input. Obviously, $T_3$ cannot be part of the input. One way to
implement such an algorithm is by "guessing" the correct value of $T_3$
and verifying the guess. This translates into $\Theta(\log n)$ passes
over the stream (in the form of a binary search, for example).

Let us mention some of the known results. One approach is based on
sampling. The algorithm suggested in [5] samples in the first pass $s$
random pairs $(e, v)$ of an edge $e = (u, w)$ and a vertex $v$, and
stores them. Then, in the second pass, it checks for every pair $(e, v)$
whether $(u, v, w)$ is a triangle. The total number of triangles is
estimated as a function of the number of pairs $(e, v)$ that formed a
triangle. The number of samples $s$ is proportional to
$(T_0 + T_1 + T_2)/T_3$ (where $T_i$ is the number of vertex triples in
the graph spanning exactly $i$ edges). The authors also show a more
sophisticated sampling algorithm which uses $(T_1 + T_2)/T_3$ samples.
A different approach reduces the problem of approximating the number
of triangles to the problem of estimating the frequency moments of the
data stream, using the Alon-Matias-Szegedy (AMS) algorithm [1]. This
approach was first presented in [3]. The algorithm in [3] uses
$T_1, T_2, T_3$ to compute the appropriate parameters with which to run
AMS. The space complexity of this algorithm is proportional to
$((T_1 + T_2)/T_3)^3$. In another work [13], the algorithm uses
$m, C_4, C_6, T_3$ to compute the appropriate parameters for AMS ($C_i$
is the number of $i$-cycles in the graph). The space complexity of that
algorithm is $(m^3 + mC_4 + C_6 + T_3^2)/T_3^2$.

Observe that all the aforementioned algorithms assume other non-trivial
graph parameters to be part of their input as well. Another disadvantage
that they share is that their space complexity depends on parameters
that are not necessarily indicative of the number of triangles in the
graph. For example, the parameter $T_2$ may have little to do with the
number of triangles $T_3$ in some graphs. Consider the graph whose
vertex set is $V = A_0 \cup A_1 \cup A_2$, each $A_i$ of size $n/3$. The
edge set $E$ consists of the complete bipartite graph on $A_0, A_1$ and
the complete bipartite graph on $A_1, A_2$. Clearly, $G = (V, E)$ has no
triangles, so $T_3 = 0$, but it has $T_2 = \Theta(n^3)$ paths of length
two. In light of the above, two interesting questions arise:

Question 1: Determine the space complexity of Triangles(1) and
Triangles($O(1)$).

Question 2: Is there an algorithm that solves Triangles(c), whose 
space complexity depends only on the number of edges and T3? 
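
The layered bipartite example described before the two questions can be checked by brute force on small instances. The following is a minimal sketch, not part of the original paper; the function names `triple_census` and `layered_bipartite` are hypothetical, and the census enumerates all triples, so it is only meant for tiny graphs.

```python
from itertools import combinations

def triple_census(n_vertices, edges):
    """Return [T0, T1, T2, T3]: the number of vertex triples spanning exactly i edges."""
    edge_set = {frozenset(e) for e in edges}
    counts = [0, 0, 0, 0]
    for u, v, w in combinations(range(n_vertices), 3):
        spanned = sum(frozenset(pair) in edge_set
                      for pair in ((u, v), (u, w), (v, w)))
        counts[spanned] += 1
    return counts

def layered_bipartite(part_size):
    """Vertices split into A0, A1, A2; complete bipartite on (A0, A1) and on (A1, A2)."""
    a0 = range(0, part_size)
    a1 = range(part_size, 2 * part_size)
    a2 = range(2 * part_size, 3 * part_size)
    edges = [(u, v) for u in a0 for v in a1] + [(u, v) for u in a1 for v in a2]
    return 3 * part_size, edges
```

With parts of size 3 (so $n = 9$) the census reports $T_3 = 0$ while $T_2 = 63$, illustrating that $T_2$ can be large in a triangle-free graph.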

Bar-Yossef et al. [3] showed that the space complexity required to
solve Triangles(1) is $\Omega(n^2)$ (throughout we disregard the memory
it takes to represent a single graph vertex). Specifically, they showed
that every one-pass 0.5-approximation algorithm that succeeds with
probability 0.99 does no better, in terms of space, than the trivial
algorithm that stores all edges and exhaustively computes the number of
triangles. While the lower bound determines the space complexity in the
worst case, it is informative to study more refined notions of "worst
case". For example, what is the space complexity of Triangles(1) when
the graph has exactly $m$ edges, or at least $T$ triangles? What is the
space complexity of Triangles(2), when two passes over the stream are
allowed rather than one? In those cases the lower bound in [3] is
irrelevant.

1.1 Our Results 

In this paper we show that the space complexity of any algorithm that
solves Triangles(1) (i.e., in one pass) is $\Omega(m)$. This lower bound
is asymptotically tight, since the trivial algorithm that stores all
edges of the graph uses that much memory. Furthermore, the lower bound
is true even when assuming that the graph has $T_3 = \Theta(n)$
triangles. Clearly one cannot expect this to be the case for every value
of $T_3$, since when $T_3 = \Theta(n^3)$, for example, a straightforward
sampling algorithm solves Triangles(1) using $O(1)$ space. Formally,

Theorem 1. There exist $c_1, c_2 > 0$ such that the space complexity of
Triangles(1) is $\Omega(m)$ when the input is an $n$-vertex graph with
$m \in [c_1 n, c_2 n^2]$ edges. Furthermore, this lower bound is true
even if the graph has as many as $0.99n$ triangles.

Theorem 1 extends the aforementioned result in [3] in two respects.
First, the number of edges may lie anywhere in asymptotically the entire
range (compared with $\Theta(n^2)$ in [3]). Second, the graph may
contain as many as a linear number of triangles (compared with one
triangle in [3]). In addition, our proof technique is conceptually and
technically simpler.

Improving upon the currently best known lower bound of $\Omega(n/T_3)$
($n$ is the number of vertices) for Triangles($O(1)$) [IE], we show that:

Theorem 2. The space complexity of Triangles($O(1)$) for input graphs
with $m$ edges and $T_3$ triangles is $\Omega(m/\max\{T_3, 1\})$.

Turning to the algorithmic part, the lower bound for Triangles(1) is
asymptotically tight. Giving a non-trivial upper bound for
Triangles($O(1)$) seems to be beyond the reach of current algorithmic
techniques. As already mentioned, all current state-of-the-art
approximation algorithms require a super-constant number of passes
(regardless of the space complexity). Hence, we start with a softer
notion of approximation, in the spirit of property testing.

Definition 2. Dist(c) is the following problem. Given two graph
families, $\mathcal{G}_1$ consisting of triangle-free graphs and
$\mathcal{G}_2$ consisting of graphs with at least $T$ triangles, and an
input graph $G \in \mathcal{G}_1 \cup \mathcal{G}_2$, decide whether
$G \in \mathcal{G}_1$ or $G \in \mathcal{G}_2$ with probability at least
2/3, using at most $c$ passes over the input.

The same lower bounds that we derive for Triangles(c) are true for
Dist(c) as well (therefore there is nothing interesting to say
algorithmically about Dist(1)). None of the aforementioned approximation
algorithms solves Dist($O(1)$), since they require additional parameters
that are not available to the algorithm (for example, $T_2$, the number
of triples spanning exactly two edges). We describe an algorithm that
solves Dist(2) using $O(m/T^{1/3})$ bits of memory. This answers
Question 2 above for the problem Dist.

We now describe our algorithm in detail and formally state the relevant
theorem. We assume that the parameter $T$ is known to the algorithm (as
it is part of the problem definition).



Algorithm A

Output: 1 iff a triangle was found.
Pass 1

(a) Set $m' = 6m/T^{1/3}$ and $p = m'/m$.

(b) Store every edge $e$ with probability $p$. If more than $5m'$ edges
are stored, output FAIL.

(c) Let $H$ be the graph stored by the algorithm at the end of (b).
Search for a triangle in $H$; if one is found, output 1.

Pass 2 For every edge $e$, check whether $e$ completes a triangle in $H$.
Output 1 iff such an edge exists.
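
The two passes of Algorithm A can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the stream is assumed to be a re-iterable list of vertex pairs without self-loops or repeated edges, the names `algorithm_a` and `has_triangle` are hypothetical, and capping $p$ at 1 is an added safeguard for values of $T$ below 216.

```python
import random

def has_triangle(adj):
    """True iff the graph {vertex: set of neighbours} contains a triangle."""
    return any(adj[u] & adj[v] for u in adj for v in adj[u])

def algorithm_a(stream, m, T, rng=None):
    """Return 1 if a triangle is detected, 0 if not, or "FAIL" on overflow."""
    rng = rng or random.Random()
    m_prime = 6 * m / T ** (1 / 3)
    p = min(1.0, m_prime / m)  # assumption: cap p at 1 when T < 216
    adj, stored = {}, 0
    # Pass 1: keep each edge independently with probability p.
    for u, v in stream:
        if rng.random() < p:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
            stored += 1
            if stored > 5 * m_prime:
                return "FAIL"
    if has_triangle(adj):
        return 1
    # Pass 2: an edge (u, v) completes a triangle in H iff u and v
    # have a common neighbour among the stored edges.
    for u, v in stream:
        if adj.get(u, set()) & adj.get(v, set()):
            return 1
    return 0
```

On a triangle-free input the sketch can only return 0 or FAIL, since both checks look for an actual triangle of the input graph.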



Theorem 3. For $T \ge 216$, Algorithm A solves Dist(2) using at most
$30m/T^{1/3}$ bits of memory.

Remark 1. When $T = \omega(1)$, our algorithm solves Dist(2) using
sublinear space. Also, our lower bound on the space complexity of
Dist(1), together with Algorithm A for Dist(2), implies a space
complexity separation between one pass and two passes. For example,
Dist(2) can be solved in space $O(m/n^{1/3}) = o(m)$ for graphs with
$T_3 = n/2$ triangles and $m$ edges, while Dist(1) requires $\Omega(m)$
space for such graphs.
Remark 2. Algorithm A assumes $m$ is given. This assumption is made
only for the sake of clear and simple presentation, and can be easily
removed. The algorithm "guesses" an initial value for $m$, say
$m_1 = 1$. This value is used to define $p$ for the first $m_1$ edges.
If the number of edges exceeds that guess, then the algorithm sets
$m_2 = 2m_1$ and updates $p$ accordingly for the next $m_2$ edges. Every
time guess $i$ is exceeded, the algorithm sets $m_{i+1} = 2m_i$. The
last interval will consist of the last $m/2$ edges. Edges are still
stored independently of each other, and in expectation twice as many
edges are stored. Storing more edges may only help the algorithm (while
not changing the asymptotic space complexity). Hence the same analysis
that we have for A goes through with this additional procedure.

1.2 A new graph parameter 

While the bound given in Theorem 1 for Triangles(1) is asymptotically
tight, we suspect that the bound in Theorem 2 for Triangles($O(1)$)
is not tight, and conjecture a tight bound instead. Define the triangle
density of a graph $G$, $\rho(G)$, to be the number of vertices that
belong to some triangle in $G$. It is easy to see that
$(6T)^{1/3} \le \rho(G) \le 3T$ (the extremes being a clique and $T$
disjoint triangles).
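
To make the definition concrete, the triangle density of a small graph can be computed by brute force. The sketch below is illustrative only: `triangles` and `triangle_density` are hypothetical helper names, vertices are assumed to be labelled $0, \dots, n-1$, and the enumeration takes cubic time.

```python
from itertools import combinations

def triangles(n_vertices, edges):
    """List all triangles (u, v, w) with u < v < w."""
    adj = {v: set() for v in range(n_vertices)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return [(u, v, w) for u, v, w in combinations(range(n_vertices), 3)
            if v in adj[u] and w in adj[u] and w in adj[v]]

def triangle_density(n_vertices, edges):
    """Number of vertices that belong to at least one triangle."""
    return len({v for tri in triangles(n_vertices, edges) for v in tri})
```

For the clique on 5 vertices this gives $T = 10$ and density 5, while three disjoint triangles give $T = 3$ and density $9 = 3T$, matching the two extremes mentioned above.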

Conjecture: The space complexity of Triangles($O(1)$) and Dist($O(1)$)
is $\Omega(m/\rho(G))$.

The lower bound in Theorem 2 is consistent with the case
$\rho(G) = \Theta(T)$, and Algorithm A is consistent with
$\rho(G) = \Theta(T^{1/3})$. We describe a second algorithm that solves
Dist(2) using $O(m/\rho(G))$ space, thus showing that one cannot hope
for a tighter bound than the one stated in the conjecture. A formal
description of the algorithm, a proof of correctness, and an analysis of
its space complexity are given in Section 2.

1.3 Techniques 

Theorem 1 is proven via a reduction from the index problem in
communication complexity, and for Theorem 2 we use a reduction from a
variant of the set disjointness problem. The idea behind the proof of
Theorem 3 is as follows. Consider the following natural and well-known
graph sparsification procedure. Given a graph $G$, store every edge,
independently of the others, with probability $p$. Let $H$ be the
sparsified version of $G$. If $G$ has at least $T$ triangles, the
expected number of triangles in $H$ is at least $p^3 T$. Taking
$p = \Omega(T^{-1/3})$, the expected number of triangles in $H$ is
$\Omega(1)$, and the expected number of edges in $H$ is $O(m/T^{1/3})$.
The main question is how concentrated the number of triangles is. For
example, think of two triangles sharing an edge. If this edge is not
picked into $H$, then neither triangle shows up in $H$. This phenomenon
may translate into a large variance in the number of triangles in $H$.
To solve this problem, we identify the graph structure responsible for
large variance. More concretely, we call $s$ triangles that share the
same edge an $s$-tower. For a carefully chosen number $s^* = s^*(p)$,
one can show the following fact: if $G$ has no $s^*$-tower, then the
variance is small and the number of triangles in $H$ is close to its
expectation. If there is an $s^*$-tower, it is tall enough so that at
least one floor survives (a floor is two edges that belong to the same
triangle). In that case, in the second pass of A, the base of that tower
is caught, and a triangle is detected.

The algorithm suggested in [17] also uses the graph sparsification
method. That algorithm computes the sparsified graph $H$ and checks
whether it contains a triangle. This approach does not allow control
over the variance, and ultimately such an algorithm will fail unless it
stores $\Theta(m)$ of the edges (think of the case when all triangles
are stacked in one tower: unless $p$ is constant, the base of the tower
will almost always be missing). This is also consistent with the lower
bound we have for Triangles(1) (as computing $H$ can be done in one
pass). Our algorithm addresses exactly this issue by trying to catch
either a triangle or a floor (which forces the second pass).

Paper Organization. We proceed with the description of our second
algorithm, mentioned in Section 1.2. The proof of Theorem 3 follows in
Section 3. The proofs of the lower bounds, Theorems 1 and 2, use rather
standard techniques, though they require some new insights. Both are
given in full in Appendices A and B.

2 The second algorithm 

Theorem 4. Algorithm A2 solves Dist(2) using $O(m/\rho(G))$ bits of
memory in expectation.

Algorithm A2($G$, $T$, $\rho(G)$)

Output: 1 if a triangle is detected, 0 otherwise.
Pass 1

(a) Sample $4n/\rho(G)$ vertices, uniformly at random. Let
$S \subseteq V$ be that set.

(b) Store all edges in the stream that touch the set $S$.

Pass 2 Check for every edge $e$ if it completes a triangle with any of
the stored edges. Output 1 iff such an edge exists.

Proof. Let $Z \subseteq V$ be the set of vertices in $G$ that belong to
some triangle. In our notation, the size of $Z$ is $\rho(G)$. The
algorithm never fails if there are no triangles in $G$. Therefore let us
consider the case where there are triangles in $G$.

The algorithm A2 fails only if $S \cap Z = \emptyset$. Otherwise, $S$
contains a vertex $v$ that belongs to some triangle $\{v, u, w\}$, and
in the first pass the algorithm stores all edges incident to $v$ (in
particular the edges $(v, u)$ and $(v, w)$). In the second pass the edge
$(u, w)$ will be considered and A2 will detect the triangle. Let us
bound the probability of $S \cap Z = \emptyset$. Let $A_i$ be the event
that the $i$-th vertex chosen to be in $S$ doesn't belong to $Z$. It is
easy to see that the $A_i$'s are negatively correlated (as the sampling
is without replacement). For every $i$, $\Pr[A_i] = 1 - \rho(G)/n$.
Therefore,

$$\Pr[S \cap Z = \emptyset] = \Pr[A_1 \wedge A_2 \wedge \dots \wedge A_{|S|}] \le (1 - \rho(G)/n)^{4n/\rho(G)} \le e^{-4}.$$

Now let us compute the expected number of edges stored by A2. For the
$i$-th vertex in $S$, let $D_i$ be a random variable counting the degree
of that vertex in $G$. Since the $i$-th vertex is a uniformly random
vertex, $E[D_i] = 2m/n$ (the average degree in $G$). Using linearity of
expectation, the expected number of edges touching $S$ is at most

$$E\Big[\sum_{i=1}^{|S|} D_i\Big] = \sum_{i=1}^{|S|} E[D_i] = (4n/\rho(G))(2m/n) = 8m/\rho(G).$$
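
Algorithm A2 can be sketched in the same style. This is a hedged illustration rather than the authors' code: vertices are assumed to be labelled $0, \dots, n-1$, the stream is assumed to be a re-iterable list of pairs, the name `algorithm_a2` is hypothetical, and the sample size is rounded up and capped at $n$ so that sampling without replacement is well defined.

```python
import math
import random

def algorithm_a2(stream, n, rho, rng=None):
    """Return 1 if a triangle is detected, 0 otherwise."""
    rng = rng or random.Random()
    # Assumption: round 4n/rho up and never sample more than n vertices.
    sample_size = min(n, math.ceil(4 * n / rho))
    S = set(rng.sample(range(n), sample_size))
    adj = {}
    # Pass 1: store every edge that touches the sampled set S.
    for u, v in stream:
        if u in S or v in S:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Pass 2: (u, v) completes a triangle with stored edges iff some
    # vertex w has both (u, w) and (v, w) stored.
    for u, v in stream:
        if adj.get(u, set()) & adj.get(v, set()):
            return 1
    return 0
```

As in the proof, the sketch never reports a triangle on a triangle-free stream, and it misses an existing triangle only when no sampled vertex lies on one.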



3 Proof of Theorem 3

We denote by $B(n, p)$ the binomial random variable with parameters $n$
and $p$, and expectation $\mu = np$. We shall use the following variant
of the Chernoff bound, whose proof can be found in |1H p. 21]. Let
$\varphi(x) = (1 + x)\ln(1 + x) - x$.

Theorem 5. If $X \sim B(n, p)$ and $t \ge 0$ is some number, then

$$\Pr(X \ge \mu + t) \le e^{-\mu\varphi(t/\mu)}, \qquad \Pr(X \le \mu - t) \le e^{-\mu\varphi(-t/\mu)}.$$
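
The two tail bounds of Theorem 5 are easy to evaluate numerically. The helper names below (`phi`, `upper_tail_bound`, `lower_tail_bound`) are hypothetical and the sketch only evaluates the stated formulas; it does not prove them. The value $\varphi(-1) = 1$ is taken as the limit of the expression as $x \to -1$.

```python
import math

def phi(x):
    """phi(x) = (1 + x) ln(1 + x) - x, with the limit value phi(-1) = 1."""
    if x == -1:
        return 1.0
    return (1 + x) * math.log(1 + x) - x

def upper_tail_bound(mu, t):
    """Bound on Pr(X >= mu + t) for a binomial X with mean mu."""
    return math.exp(-mu * phi(t / mu))

def lower_tail_bound(mu, t):
    """Bound on Pr(X <= mu - t) for a binomial X with mean mu."""
    return math.exp(-mu * phi(-t / mu))
```

As illustrative values, `upper_tail_bound(4, 16)` bounds the probability that a binomial variable with mean 4 reaches 20, and `lower_tail_bound(36, 35)` bounds the probability that a binomial variable with mean 36 is at most 1; both come out far below 1/50 and 0.001 respectively.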
The algorithm A always answers correctly if the graph $G$ has no
triangles. Therefore, it suffices to bound the error probability when
the graph $G$ has at least $T \ge 1$ triangles.

Let $H$ be the graph in which each edge of the stream is included with
probability $p$. Let $B_1$ be the event that Algorithm A outputs FAIL or
the wrong answer, $B_2$ the event that more than $5m'$ edges were stored
in the first pass (causing the algorithm to output FAIL), and $B_3$ the
event that $H$ has no triangles and no edge of the stream completes a
triangle in $H$. Then

$$\Pr[B_1] \le \Pr[B_2] + \Pr[B_3 \mid \overline{B_2}] \le \Pr[B_2] + \Pr[B_3]/\Pr[\overline{B_2}]$$

(in the last inequality we used the fact that for two events $A, B$,
$\Pr[A \mid B] = \Pr[A \wedge B]/\Pr[B] \le \Pr[A]/\Pr[B]$). The number
of edges stored by A is a binomial random variable with expectation
$mp = m'$. In our case, $m' \ge 4$: we can assume w.l.o.g. that
$m \ge n/2$ (isolated vertices are never visible to the algorithm), and
$T$ always satisfies $T \le n^3/6$, therefore $m' = 6m/T^{1/3} \ge 4$.
Using Theorem 5, the probability of storing more than $5m'$ edges is at
most 1/50, hence $\Pr[B_2] \le 1/50$. In turn,

$$\Pr[B_1] \le 1/50 + (50/49)\Pr[B_3].$$

It suffices to show that $\Pr[B_3] \le 0.3$, and then derive
$\Pr[B_1] \le 1/3$, as required.

We call $s$ triangles that share the same edge an $s$-tower. Each pair
of edges that belong to the same triangle is called a floor in the
tower. Let $T_3 \ge T$ be the number of triangles in $G$. For
$p = m'/m = 6/T^{1/3}$, let $\mu = p^3 T_3 = 216\,T_3/T$ be the expected
number of triangles in $H$ and $\sigma^2$ the variance ($\sigma$ is the
standard deviation).

Lemma 1. If $G$ contains no tower with more than $T_3^{2/3}$ floors,
then $\sigma \le 110\,(T_3/T)^{5/6}$.

Before we prove Lemma 1 we need the following lemma.

Lemma 2. Let $G$ be a graph with $T_3$ triangles, having no tower with
more than $h$ floors. Let $\pi(G)$ be the number of pairs of triangles
that share an edge. Then $\pi(G) \le 3T_3 h/2$.

Proof. Observe that every pair of triangles that share an edge belongs
to exactly one tower: if the pair belongs to two towers, then the two
triangles share two edges, but then they are the same triangle. Every
pair belongs to at least one tower, since every such pair is a tower of
height two. Therefore we can count the number of pairs sharing an edge
by counting the number of pairs of triangles in every tower. Let $a_i$
be the number of towers with $i$ floors. Using this notation,
$\pi(G) = \sum_{i=2}^{h} a_i \binom{i}{2}$. Also observe that
$\sum_{i=2}^{h} i a_i \le 3T_3$. The sum counts the number of triangles
that belong to some tower, when every such triangle is accounted for at
most three times (as it belongs to at most three different towers).
Finally, we have

$$\pi(G) = \sum_{i=2}^{h} a_i \binom{i}{2} \le \frac{1}{2}\sum_{i=2}^{h} (i a_i) \cdot i \le \frac{1}{2}\sum_{i=2}^{h} (i a_i) \cdot h = \frac{h}{2}\sum_{i=2}^{h} i a_i \le 3T_3 h/2.$$
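
The counting argument of Lemma 2 can be sanity-checked on small graphs. The sketch below is an illustration with a hypothetical helper name (`tower_stats`); it treats the set of triangles through a given edge as the tower on that edge, and assumes vertices are labelled $0, \dots, n-1$.

```python
from collections import Counter
from itertools import combinations

def tower_stats(n_vertices, edges):
    """Return (T3, h, pi): number of triangles, height of the tallest tower,
    and number of pairs of triangles that share an edge."""
    edge_set = {frozenset(e) for e in edges}
    tower = Counter()  # edge -> number of triangles containing that edge
    t3 = 0
    for tri in combinations(range(n_vertices), 3):
        sides = [frozenset(s) for s in combinations(tri, 2)]
        if all(s in edge_set for s in sides):
            t3 += 1
            tower.update(sides)
    h = max(tower.values(), default=0)
    # Two distinct triangles share at most one edge, so summing C(c, 2)
    # over edges counts every sharing pair exactly once.
    pi = sum(c * (c - 1) // 2 for c in tower.values())
    return t3, h, pi
```

For the clique on 5 vertices this yields $T_3 = 10$, $h = 3$ and $\pi = 30$, which is within the bound $3T_3h/2 = 45$.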

Proof. (Lemma 1) Index the triangles in $G$ by $1, 2, \dots, T_3$. Let
$I_j$ be the indicator random variable which takes the value 1 if all
three edges of triangle $j$ belong to $H$. Then

$$E[I_j] = p^3, \qquad \mathrm{Var}[I_j] = p^3(1 - p^3) \le p^3.$$

In this notation, $\sigma^2$ (the variance of the number of triangles in
$H$) is given by

$$\sigma^2 = \sum_{i=1}^{T_3} \mathrm{Var}(I_i) + \sum_{i<j} \mathrm{Cov}(I_i, I_j), \qquad \sum_{i=1}^{T_3} \mathrm{Var}(I_i) \le T_3 p^3 = \mu.$$

For two triangles that share no edge,
$\mathrm{Cov}(I_i, I_j) = E[I_i I_j] - E[I_i]E[I_j] = p^6 - p^6 = 0$.
Therefore we only need to go over pairs of triangles that share an edge.
For every such pair, $\mathrm{Cov}(I_i, I_j) = p^5 - p^6 \le p^5$. By
Lemma 2 with $h = T_3^{2/3}$, there are at most $1.5\,T_3^{5/3}$ pairs of
triangles that share an edge. Hence,

$$\sum_{i<j} \mathrm{Cov}(I_i, I_j) \le 1.5\,T_3^{5/3} p^5 \le \left(\frac{278\,T_3}{T}\right)^{5/3}.$$

To summarize,

$$\sigma^2 \le \frac{216\,T_3}{T} + \left(\frac{278\,T_3}{T}\right)^{5/3} \le \left(\frac{282\,T_3}{T}\right)^{5/3}.$$

Taking the square root, we get the desired bound on $\sigma$.

Proposition 1. Conditioned on $G$ not having a tower with more than
$T_3^{2/3}$ floors, $\Pr[B_3] \le 0.26$.
Proof. For a random variable $X$ with expectation $\mu$ and standard
deviation $\sigma$, Chebyshev's inequality implies
$\Pr[X = 0] \le (\sigma/\mu)^2$. The expected number of triangles in $H$
is $\mu = 216\,T_3/T$. The standard deviation satisfies
$\sigma \le 110\,(T_3/T)^{5/6}$ (by Lemma 1). Therefore

$$\Pr[\text{no triangles in } H] \le \left(\frac{110}{216}\left(\frac{T}{T_3}\right)^{1/6}\right)^2 < 0.26.$$

Next we turn to the case where $G$ contains a tower with at least
$T_3^{2/3}$ floors.

Proposition 2. Conditioned on $G$ having a tower with at least
$T_3^{2/3}$ floors, $\Pr[B_3] \le 0.001$.

Proof. Fix a tower with at least $T_3^{2/3}$ floors. Every floor belongs
to $H$ independently of the others with probability $p^2$. Therefore the
expected number of floors from that tower that belong to $H$ is at least

$$p^2 T_3^{2/3} = 36\,(T_3/T)^{2/3} \ge 36.$$

Using Chernoff's bound (second inequality of Theorem 5) with $\mu = 36$
and $t = 35$, we get

$$\Pr[\text{no floor from the tower belongs to } H] \le e^{-36\varphi(-35/36)} \le e^{-30} < 0.001.$$

Finally, from Propositions 1 and 2 we get
$\Pr[B_3] \le 0.26 + 0.001 < 0.3$. To complete the proof of Theorem 3,
observe that the space complexity of A never exceeds $5m'$, which is
$30m/T^{1/3}$.
A Proof of Theorem 1

The theorem is a direct corollary of the same lower bound, just for the 
problem Dist(1). Clearly, if one can approximate the number of triangles, 
one can distinguish between the case where there are no triangles, or at 
least T triangles. 

Proposition 3. There exist $c_1, c_2 > 0$ such that the space complexity
of Dist(1) is $\Omega(m)$ when the input is an $n$-vertex graph with
$m \in [c_1 n, c_2 n^2]$ edges. Furthermore, this lower bound is true
even if the graph has as many as $0.99n$ triangles.

In the rest of this section we prove Proposition 3. It suffices to
consider only the case $T = 0.99n$ (since, by the way Dist is defined,
$T$ is a lower bound on the number of triangles).

Let $\mathrm{Index}_p$ be the following problem: Alice has a binary
vector $x$ of length $p$ and Bob has an index $i$ between 1 and $p$.
Alice communicates with Bob (one-way communication) in order to
determine the value of $x[i]$ with probability better than 1/2. The
randomized communication complexity of this problem is $\Omega(p)$
[6, 12]. For a fixed function $g(\cdot)$, let $\mathcal{G}(n, g(n))$ be
a family of graphs on $n$ vertices with $\Theta(g(n))$ edges and $0.99n$
triangles. Assume by contradiction that there exist $g(\cdot)$ and an
algorithm A that solves Dist(1) for the graphs in $\mathcal{G}(n, g(n))$
using $o(g(n))$ memory. We shall use this algorithm to solve
$\mathrm{Index}_p$ using $o(p)$ bits of communication, contradicting the
established lower bound for that problem.

Consider the following graph $G(x, i)$. Let $f(n) = g(n)/n$, and let $a$
be such that $a f(a) = p$ (assume w.l.o.g. that $f(a) \le a$, otherwise
rename them). The vertex set of $G$ consists of $n$ vertices partitioned
into three sets: $V = X \cup Y \cup Z$, $|X| = a$, $|Y| = f(a)$,
$|Z| = T$. We require $T = 0.99n$ and $a + f(a) + T = n$, therefore
$n = (a + f(a))/(1 - 0.99) = \Theta(a)$. Let $x_k$ be the $k$-th vertex
in $X$ (and similarly $y_l$ in $Y$ and $z_r$ in $Z$). Define the edge
set $E_1$ as follows: the first $f(a)$ entries in $x$ determine the
neighbors of $x_1$ in $Y$ (we place the edge $(x_1, y_j)$ iff
$x[j] = 1$), the next $f(a)$ entries determine the neighbors of $x_2$,
and so on. Define the edge set $E_2$ as follows: let $(x_k, y_l)$ be the
pair of vertices corresponding to Bob's index $i$ in $x$; add $2T$ edges
of the form $(z_r, x_k)$ and $(z_r, y_l)$ for $r = 1, \dots, T$. Finally,
$E(G) = E_1 \cup E_2$. Let $m = |E(G)|$.

The graph $G$ enjoys the following properties: (a)
$G \in \mathcal{G}(n, g(n))$: the number of edges in $G$ is
$m = \Theta(p) = \Theta(a f(a))$, and since $n = \Theta(a)$,
$m = \Theta(n f(n)) = \Theta(g(n))$; (b) the graph $G$ has $T$ triangles
if $x[i] = 1$ and no triangles otherwise.
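
Property (b) can be checked on toy instances of the construction. The sketch below is illustrative only: `index_graph` and `count_triangles` are hypothetical names, indices are 0-based, vertices are tagged tuples, and the constraint $T = 0.99n$ is not enforced, since only the triangle count is being illustrated.

```python
def index_graph(x, i, a, f, T):
    """Return (alice_edges, bob_edges) for the graph G(x, i).

    x: 0/1 list of length a * f (Alice's vector); i: Bob's 0-based index.
    """
    alice = [(("x", k), ("y", l))
             for k in range(a) for l in range(f) if x[k * f + l]]
    k, l = divmod(i, f)  # the pair (x_k, y_l) encoded by position i
    bob = ([(("z", r), ("x", k)) for r in range(T)]
           + [(("z", r), ("y", l)) for r in range(T)])
    return alice, bob

def count_triangles(edges):
    """Count triangles in a simple graph given as a list of distinct edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3
```

Feeding `alice + bob` to `count_triangles` gives $T$ when the bit at Bob's index is 1 and 0 otherwise, which mirrors how Alice's part of the stream precedes Bob's in the reduction.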

To solve $\mathrm{Index}_p$, Alice feeds the algorithm A with $E_1$ and
records its memory tape. When finished, she sends it to Bob. Bob then
feeds A with the edge set $E_2$, and answers according to A. The
correctness is now immediate: since $G(x, i) \in \mathcal{G}(n, g(n))$,
the algorithm A answers correctly with probability at least 2/3, and by
property (b), this is also the correct answer for $\mathrm{Index}_p$. As
for the communication complexity, the number of edges in $G$ satisfies
$m = \Theta(p)$ (since $x$ contains $\Theta(p)$ ones, and Bob adds only
$2T = O(p)$ edges). Algorithm A uses $o(m)$ bits of memory, therefore
the data sent by Alice is of the order of $o(m) = o(p)$, and a
contradiction is derived. Finally, observe that $g(n)$ can be arbitrary,
therefore $m = \Theta(g(n))$ has the desired range $[c_1 n, c_2 n^2]$.

B Proof of Theorem 2

The theorem is a direct corollary of the same lower bound, just for the
problem Dist($O(1)$). Clearly, if one can approximate the number of
triangles, one can distinguish between the case where there are no
triangles and the case where there are at least $T$ triangles.

Proposition 4. The space complexity of Dist($O(1)$) is
$\Omega(m/\max\{T_3, 1\})$, for input graphs with $m$ edges and $T_3$
triangles.

In the rest of this section we prove Proposition 4. Let
$\mathrm{Disj}_p^r$ be the following problem: Alice and Bob each have a
vector of length $p$ with exactly $r$ ones in each vector. Each vector
is interpreted as the characteristic vector of a subset of
$\{1, 2, \dots, p\}$. Alice and Bob communicate in order to decide
whether their sets intersect or not. Let us define the problem
$\mathrm{Disj}_p^{r,t}$ to be the same as $\mathrm{Disj}_p^r$, except
that now the intersection is promised to be either empty or of size at
least $t$. Observe that the size of the intersection is also given by
$\sum_{i=1}^{p} x_i y_i$ ($x, y$ are their vectors).

We first describe a reduction from $\mathrm{Disj}_p^{r,t}$ to
Dist($O(1)$), then establish a lower bound on the communication
complexity of $\mathrm{Disj}_p^{r,t}$. Let $x, y$ be two vectors in
$\{0, 1\}^p$ for $p = n^2$, $n$ an integer (w.l.o.g. we can assume $n$
is an integer, since we can always pad the vectors $x$ and $y$ with
zeros). Consider the following graph $G^* = G(x, y)$. The set of
vertices is $V = A \cup B \cup C$, each part of size $n$. Let $a_i$ be
the $i$-th vertex in $A$ (and similarly define $b_i, c_i$). We interpret
the vector $x$ as follows: the first $n$ entries in $x$ determine the
neighbors of $a_1$ in $B$, that is, $(a_1, b_j) \in E$ iff $x[j] = 1$.
In the same way, entries $(i-1)n + 1, \dots, in$, for $i = 2, \dots, n$,
determine the neighbors of $a_i$. Similarly, $y$ determines the
neighbors of $c_i$ in $B$ for $i = 1, \dots, n$. In addition, we have the
following set of $n$ edges: $(a_i, c_i) \in E$ for $i = 1, \dots, n$ (a
perfect matching on $A$ and $C$).

Lemma 3. The graph $G^*$ has $T$ triangles iff $\sum_j x_j y_j = T$.

Proof. Consider a triangle in $G^*$; it must contain exactly one edge of
the perfect matching, $(a_i, c_i)$. To complete the triangle there must
be two additional edges $(a_i, b_k), (c_i, b_k)$. This however implies
that $x_{(i-1)n+k} = y_{(i-1)n+k} = 1$. If on the other hand
$x_j = y_j = 1$ for $j = (i-1)n + k$ with $i, k \in [n]$, then the edges
$(a_i, b_k), (c_i, b_k)$ are present in $G^*$. Therefore together with
the edge $(a_i, c_i)$ they induce a triangle in $G^*$.
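
Lemma 3 can be illustrated by building $G^*$ for small vectors and comparing the triangle count with the inner product. This is a sketch with hypothetical names (`disj_graph`, `count_triangles`), using 0-based indices and tagged tuples for the vertices of $A$, $B$ and $C$.

```python
def disj_graph(x, y, n):
    """Edge list of G* = G(x, y) for 0/1 lists x, y of length n * n."""
    edges = [(("a", i), ("c", i)) for i in range(n)]  # perfect matching
    edges += [(("a", i), ("b", k))
              for i in range(n) for k in range(n) if x[i * n + k]]
    edges += [(("c", i), ("b", k))
              for i in range(n) for k in range(n) if y[i * n + k]]
    return edges

def count_triangles(edges):
    """Count triangles in a simple graph given as a list of distinct edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Every triangle is seen once from each of its three edges.
    return sum(len(adj[u] & adj[v]) for u, v in edges) // 3
```

For `x = [1, 0, 1, 1]` and `y = [1, 1, 0, 1]` with `n = 2`, the two vectors agree on two positions and the graph has exactly two triangles.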
Lemma 4. The communication complexity of $\mathrm{Disj}_p^{r,t}$ is
$\Omega(r/t)$ for any $r \le p/2$.

Proof. Assume by contradiction that the communication complexity of
$\mathrm{Disj}_p^{r,t}$ is $o(r/t)$. Let $r' = r/t$, and consider the
problem $\mathrm{Disj}_{p/t}^{r'}$. Given two vectors $x', y'$ of length
$p/t$ with $r'$ ones each, we construct the vector $x$ by taking $t$
concatenated copies of $x'$. Similarly construct $y$. Clearly, if
$x', y'$ intersect then $x, y$ have intersection size at least $t$, and
if $x', y'$ are disjoint so are $x, y$. We can then solve
$\mathrm{Disj}_{p/t}^{r'}$ using $o(r/t) = o(r')$ bits of communication
by reducing to $\mathrm{Disj}_p^{r,t}$. This however contradicts the
lower bound established in [10] on the disjointness problem.
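
The concatenation step in the proof of Lemma 4 is simple enough to demonstrate directly. The helper names `replicate` and `intersection_size` below are hypothetical, and vectors are plain 0/1 lists.

```python
def replicate(vec, t):
    """Concatenate t copies of a 0/1 list."""
    return list(vec) * t

def intersection_size(x, y):
    """Size of the intersection of the sets encoded by 0/1 lists x and y."""
    return sum(a * b for a, b in zip(x, y))
```

If the short vectors share one position, their $t$-fold concatenations share exactly $t$ positions, and disjoint vectors stay disjoint, which is the gap promise required by the problem with parameter $t$.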

Proposition 5. If there exists an algorithm A that solves Dist($O(1)$)
using $o(m/T)$ bits of memory, then there exists an algorithm A* that
solves $\mathrm{Disj}_p^{r,t}$ using $o(r/t)$ bits of communication
whenever $r = \Omega(\sqrt{p})$.

Proof. We describe the algorithm A* for $\mathrm{Disj}_p^{r,t}$. Alice
has the vector $x$ and Bob has $y$. Alice runs A on the stream of edges
of $G^*$ that includes the matching edges and the edges induced by $x$.
She then sends the content of her memory to Bob, who continues to run A
while feeding it the edges induced by $y$. At the end, Bob sends the
content of his memory to Alice, and this repeats for the number of
passes that A requires (which is constant). They answer 'Disjoint' iff A
outputs 0. The correctness of the algorithm comes from Lemma 3: if $x$
and $y$ are disjoint, then $G^*$ has no triangles, and A outputs 0 with
probability at least 2/3. If $x$ and $y$ intersect, then the
intersection size is at least $t$, and hence $G^*$ has at least $T = t$
triangles. Accordingly, A outputs 1 with probability at least 2/3. The
communication complexity of A* is the same, up to a constant factor, as
the memory used by A (as only the memory content is being transmitted).
The number of edges $m$ in $G^*$ is $\Theta(r)$: there are $n$ matching
edges and $2r$ edges coming from $x$ and $y$, and since
$r = \Omega(\sqrt{p}) = \Omega(n)$ we have $m = \Theta(r)$. The number of
triangles in $G^*$ is $T = t$, therefore A* solves
$\mathrm{Disj}_p^{r,t}$ using $o(m/T) = o(r/t)$ bits of communication.

Proposition 4 follows from Lemma 4 and Proposition 5.



